ABSTRACT

Rare  special  populations  for  which  no  list  exists  require  costly 
screening.  Efficient  procedures  are  discussed  in  this  paper  for  reducing 
these  screening  costs,  if  the  populations  are  geographically  clustered. 
These  procedures  involve  telephone,  face-to-face  screening  and  mixed 
modes. If the special population is not geographically clustered, multiplicity
procedures may be useful.

EFFICIENT SCREENING METHODS FOR
THE SAMPLING OF SPECIAL POPULATIONS


1.  Introduction 


A  special  population  is  defined  as  a  subgroup  of  a  general  population 
for which no complete list is available. Such populations require screening
of the general population before data collection. If the population
is  rare  or  very  rare,  screening  costs  may  become  very  large  and  account  for 
the  major  share  of  costs  of  data  collection.  Procedures  are  available, 
however,  for  reducing  these  screening  costs. 

There  are  many  special  populations  that  are  geographically  clustered. 
A  few  examples  are: 

a. Racial and ethnic groups such as Blacks, Hispanics, Cubans,
Vietnamese and recent immigrants from the Soviet Union.

b. Special types of housing such as very large apartments, trailers,
substandard housing, etc.

c.  Employees  in  specified  industries  such  as  asbestos  workers. 

d.  Purchasers  of  new  products  with  limited  distribution. 

Other special populations are not geographically clustered. Examples
of these are:

e. Cancer patients

f. Alcoholics

g. Accident victims

In this paper some alternative procedures for reducing screening
costs are discussed. Two dimensions are considered:

a. Geographic clustering - If the special population is geographically
clustered, costs may be reduced substantially by rapidly identifying
zero segments in which no members of the special population can be found,
or by undersampling clusters with very few members of the special population.
If the special population is not geographically clustered,
multiplicity procedures may be useful.

b. Methods of screening and data collection - Screening and data
collection may be conducted using mail, telephone or face-to-face procedures,
and mixed methods are possible, with screening being done in a
less expensive way than interviewing.

The standard procedure for screening such special populations is
to increase the size of the cluster. See, for example, Kish (1965,
pp. 904-10). Thus, if a special population is a proportion p of the total
population and a cluster size of n completed cases is optimum, initial
clusters of n/p would be required.

2.  Sampling  When  There  Are  Many  Zero  Geographic  Segments 

Many geographically clustered special populations are located in a
limited number of geographic areas. Conversely, there is a large fraction
of total geographic segments in which no members of the special population
are located. The standard procedure in this case often leads to hundreds
of screening calls in these zero segments and no eligible respondents.

If the zero segments are known in advance from Census data or some
other source, substantial cost savings are possible by eliminating the
screening of these zero segments. In this case, the optimum procedures
for screening and data collection using mail or telephone procedures
involve no clustering while the optimum procedures for face-to-face
interviewing would utilize the standard optimum cluster procedures
developed by Hansen, Hurwitz and Madow (1953). Frequently, however,
zero segments are not known in advance.

Zero segments unknown in advance - The rapid elimination of zero
segments through use of one (or a few) screening contacts can substantially
reduce screening costs, particularly if the proportion of zero
segments is high. The widely used method for improving the efficiency of
random digit dialing procedures described by Waksberg (1978) may be
adapted for special populations with even greater cost savings than for
general populations.

The procedure, as used in either mail or telephone screening, requires
that initially a single unit be sampled within a geographic segment. If
that unit is a member of the special population, additional screenings
are conducted in the segment until the desired cluster size is reached.
If it is not, the segment is dropped without further screening.
This procedure produces a sample in which each selected unit has an equal
probability of selection.
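
The two-stage procedure just described can be illustrated with a small simulation. The sketch below is not from the paper: it assumes that segments are effectively infinite, that a fraction t of segments are zero segments, and that every non-zero segment has the same eligibility rate pi/(1-t). The function names are hypothetical. Under those assumptions the simulated number of screening contacts should be close to the expected value (m/pi)[1 + (1-t)k].

```python
import random


def simulate_two_stage(m, k, pi, t, seed=0):
    """Simulate the two-stage screening rule and return total screening contacts.

    m  : number of retained (non-rejected) segments wanted
    k  : additional eligible units sought in each retained segment
    pi : proportion of the special population in the total population
    t  : proportion of zero segments
    """
    rng = random.Random(seed)
    # Assumption: all non-zero segments share one eligibility rate.
    p_nonzero = pi / (1.0 - t)
    calls = 0
    retained = 0
    while retained < m:
        # Stage 1: screen a single unit in a randomly chosen segment.
        calls += 1
        is_zero_segment = rng.random() < t
        if is_zero_segment or rng.random() >= p_nonzero:
            continue  # first unit not eligible: drop the segment
        retained += 1
        # Stage 2: keep screening in this segment until k more eligibles are found.
        found = 0
        while found < k:
            calls += 1
            if rng.random() < p_nonzero:
                found += 1
    return calls


def expected_calls(m, k, pi, t):
    """Expected screening contacts, (m/pi)[1 + (1-t)k]."""
    return (m / pi) * (1 + (1 - t) * k)


if __name__ == "__main__":
    simulated = simulate_two_stage(500, 7, 0.036, 0.9, seed=1)
    print(simulated, round(expected_calls(500, 7, 0.036, 0.9)))
```

With real segments of finite size the second stage can run out of eligible units, which is the bias problem taken up later under clusters with too few eligible households.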

Generally following Waksberg's notation, let:

n = the total sample size of the special population
m = the number of geographic segments
k+1 = the cluster size in the sample
ρ = homogeneity (intraclass correlation) within the geographic segments
t = proportion of segments with no members of the special population (zero segments)
π = proportion of special population to total population
C_S = cost of screening a contact
C_I = cost of data collection from an eligible unit
C_P = total costs of this method


Waksberg  shows  that  the  expected  screening  costs  for  this  procedure 
are: 

\[ \frac{m}{\pi}\,[1 + (1-t)k]\,C_S \tag{2.1} \]

Total costs for screening and interviewing are:

\[ C_P = m(k+1)\,C_I + \frac{m}{\pi}\,[1 + (1-t)k]\,C_S
       = m(k+1)\left[C_I + C_S\,\frac{1-t}{\pi}\right] + \frac{mt}{\pi}\,C_S \tag{2.2} \]

It  therefore  follows  immediately  from  Hansen,  Hurwitz  and  Madow  (1953, 
Ch.  6)  that: 


\[ (k+1)_{\text{optimum}} = \left[\frac{t\,C_S}{\pi\left[C_I + \frac{(1-t)}{\pi}\,C_S\right]}\cdot\frac{1-\rho}{\rho}\right]^{1/2} \tag{2.3} \]
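
The step from the cost function to the optimum cluster size can be sketched as follows. This is a reconstruction under the usual assumption that the variance of a clustered sample of m clusters of size k+1 is proportional to [1 + kρ]/[m(k+1)]; the shorthand a, b and c is introduced here and is not the paper's notation. Write c = k+1, a = tC_S/π for the fixed cost per retained cluster and b = C_I + C_S(1-t)/π for the cost per completed case, so that the total cost is m(a + bc). For a fixed total cost, minimizing the variance is equivalent to minimizing

\[
\begin{aligned}
f(c) &= \frac{(a + bc)\,[\,1 - \rho + \rho c\,]}{c}
      = \frac{a(1-\rho)}{c} + a\rho + b(1-\rho) + b\rho c, \\
f'(c) &= -\frac{a(1-\rho)}{c^{2}} + b\rho = 0
      \quad\Longrightarrow\quad
      c = \left[\frac{a}{b}\cdot\frac{1-\rho}{\rho}\right]^{1/2}
        = \left[\frac{t\,C_S}{\pi C_I + (1-t)\,C_S}\cdot\frac{1-\rho}{\rho}\right]^{1/2}.
\end{aligned}
\]

The same square-root form, with different fixed and per-case costs, should apply to the other cost functions in the paper.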

The  cost  for  a  sample  of  size  n  is  found  by  substituting  the  value  of 
k+1  from  formula  (2.3)  into  formula  (2.2).  The  cost  for  a  nonclustered 
sample  with  the  equivalent  variance  is 


\[ C_U = \frac{m(k+1)}{1+k\rho}\left[C_I + \frac{C_S}{\pi}\right] \tag{2.4} \]

The ratio of costs from formulas (2.2) and (2.4) is:

\[ \frac{C_P}{C_U} = (1+k\rho)\,\frac{C_I\pi + C_S(1-t) + tC_S/(k+1)}{C_I\pi + C_S} \tag{2.5} \]

Table 2.1 presents optimum values of the function (k+1)/[(1-ρ)/ρ]^(1/2) for
values of t, π, and C_I/C_S. Table 2.2 presents relative data collection
costs for this optimum clustering as compared to unclustered samples. It
may be seen that very substantial cost savings of about 70 percent or more
are possible in the upper left hand corner of the table when t is around
.9, π is correspondingly low, and ρ is around .01. On the other hand, there
is no advantage to these clustered screening methods in the lower right
hand corner where t is less than .5 or .6, π is greater than .2 and ρ
is about .10.

Two examples illustrate the effectiveness of geographic screening.
For both examples, it is assumed that telephone sampling and interviewing
are used and that virtually all eligible households have telephones.

Example 2.1: Phone Screening of Black Households Using Random Digit
Dialing

Suppose one wishes to select a national phone sample of 1,200 Black
households. (Such a sample is currently being used to obtain Black
attitudes on public policy and marketing issues.) The proportion of Black
households in the United States is about .12 so that the estimate of π
is (.12)(.30) = .036. (The .30 estimate of working phone numbers is
based on Groves and Kahn (1979) estimates that 65 percent of banks of 100
numbers are nonworking and on the additional estimate that about 20
percent of numbers in working banks are not working household numbers.)

Based on an analysis of some Census Block Statistics, it is estimated
that about 70 percent of working banks have no Black households. Thus,
t = 1 - (.35)(.30) = .9. Assume that ρ = .05, C_I = $10 and C_S = $2. Then:


\[ (k+1)_{\text{optimum}} = \left[\frac{.9}{(.036)5 + .1}\cdot\frac{.95}{.05}\right]^{1/2} \]

or using Table 2.1 and interpolating, (k+1)/[(1-ρ)/ρ]^(1/2) = 1.82 and
k+1 = (1.82)(4.36) = 8.

The actual cost for an unclustered sample with an equivalent variance is:

\[ C_U = \frac{1200}{1 + 7(.05)}\left[10 + \frac{2}{.036}\right] = \$58{,}272 \]

Table 2.1  Optimum Values of (k+1)/[(1-ρ)/ρ]^(1/2) for Values of t, π, C_I/C_S *

                                     π
   t    C_I/C_S    .025    .05    .10    .20    .30    .40    .50

  .95      2       3.08   2.52     -      -      -      -      -
           5       2.33   1.79     -      -      -      -      -
  .90      2       2.45   2.12   1.73     -      -      -      -
           5       2.00   1.60   1.22     -      -      -      -
  .80      2       1.79   1.63   1.41   1.15     -      -      -
           5       1.57   1.33   1.07    .82     -      -      -
  .70      2       1.41   1.32   1.18   1.00    .88     -      -
           5       1.28   1.13    .94    .73    .62     -      -
  .60      2       1.15   1.10   1.00    .87    .77    .71     -
           5       1.14    .96    .82    .65    .56    .50     -
  .50      2        .95    .91    .85    .75    .67    .62    .58
           5        .89    .82    .71    .58    .50    .45    .41
  .25      2        .56    .54    .51    .47    .43    .40    .38
           5        .53    .50    .45    .38    .33    .30    .28

* To find optimum k+1: if ρ = .1, multiply by 3; if ρ = .05, multiply by 4.36;
  if ρ = .02, multiply by 7; if ρ = .01, multiply by 9.95; etc.

For the optimum cluster design, the cost is:

\[ C_P = 1200\left[10 + \frac{.1}{.036}(2)\right] + \frac{150}{.036}(.9)(2) = 18{,}667 + 7{,}500 = \$26{,}167 \]

and the ratio C_P/C_U = .44, which could also be obtained from Table 2.2.
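
The arithmetic of Example 2.1 can be checked with a short script. This is a minimal sketch, assuming the cost functions (2.2) to (2.4) as written above; the function and argument names are illustrative and not the paper's.

```python
from math import sqrt


def optimum_cluster_size(t, pi, c_i, c_s, rho):
    """Optimum k+1 from formula (2.3)."""
    return sqrt(t * c_s / (pi * c_i + (1 - t) * c_s) * (1 - rho) / rho)


def clustered_cost(n, cluster_size, t, pi, c_i, c_s):
    """Total cost C_P from formula (2.2) for n = m(k+1) completed cases."""
    m = n / cluster_size
    return n * (c_i + c_s * (1 - t) / pi) + m * t * c_s / pi


def unclustered_cost(n, cluster_size, pi, c_i, c_s, rho):
    """Cost C_U of an unclustered sample with equivalent variance, formula (2.4)."""
    k = cluster_size - 1
    return n / (1 + k * rho) * (c_i + c_s / pi)


if __name__ == "__main__":
    # Example 2.1 inputs: t = .9, pi = .036, C_I = 10, C_S = 2, rho = .05
    size = optimum_cluster_size(0.9, 0.036, 10, 2, 0.05)
    c_p = clustered_cost(1200, 8, 0.9, 0.036, 10, 2)
    c_u = unclustered_cost(1200, 8, 0.036, 10, 2, 0.05)
    print(round(size, 1), round(c_p), round(c_u), round(c_p / c_u, 3))
```

The unrounded optimum is slightly below the rounded cluster size of 8 used in the text, and the cost ratio computed this way agrees with the text's figure to within rounding.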

Example  2.2  Phone  Survey  of  American  Jews 

It  is  estimated  that  Jews  account  for  about  three  percent  of  the  total 
U.S.  population  and  are  heavily  concentrated  in  the  largest  cities.  For 
this  example,  the  following  estimates  are  made: 

π = (.03)(.35) = .01

t = 1 - (.35)(.4) = .86

C_I = $10, C_S = $2, ρ = .02

Then (k+1)_optimum = 15 and from formula (2.5), for any sample size

\[ \frac{C_P}{C_U} = 1.28\,\frac{.1 + 2(.14) + .86(2)/15}{.1 + 2} = .30 \]

Clusters with too few eligible households - As illustrated by the last
example, optimum cluster sizes may be fairly large for relatively rare
population screening. The procedure becomes biased if a selected cluster
contains fewer than k+1 members of the special population. Two procedures
are possible to prevent this bias:

a. Increasing the size of the cluster from banks of 100 to banks of
several hundred or more.

b. Weighting the data in a cluster by the ratio (k+1)/r where r is the
number of eligible households within the cluster. Optimum procedures
would usually involve weighting. This is discussed in Section 7.
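
The weighting in procedure b can be sketched as follows. This is an illustration rather than the paper's estimator: it assumes each cluster yields at most k+1 completed cases and applies the weight (k+1)/r to every case in a cluster with r completed cases. The function name and data layout are hypothetical.

```python
def weighted_mean(cluster_values, cluster_size):
    """Mean of a survey variable with each cluster weighted by (k+1)/r.

    cluster_values : list of lists, one inner list of observations per cluster
    cluster_size   : the target cluster size k+1
    """
    numerator = 0.0
    denominator = 0.0
    for values in cluster_values:
        r = len(values)
        if r == 0:
            raise ValueError("a retained cluster has at least one eligible unit")
        if r > cluster_size:
            raise ValueError("a cluster cannot exceed the target size k+1")
        weight = cluster_size / r  # equals 1 for a full cluster
        numerator += weight * sum(values)
        denominator += weight * r
    return numerator / denominator


if __name__ == "__main__":
    # One full cluster of 4 and one short cluster with only 2 eligible units.
    print(weighted_mean([[1, 1, 1, 1], [3, 3]], cluster_size=4))
```

Because the weight times r equals k+1 in every cluster, this amounts to giving each retained cluster the same total weight, so a short cluster is not under-represented.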


3.  Face-to-Face  Screening  Required 

For some purposes, such as determining the condition of housing
units and interviewing in those that meet, or fail to meet, prespecified
criteria,  or  in  ethnic  groups  with  low  phone  coverage,  it  may  be  necessary 
to  conduct  face-to-face  screening.  The  cost  function  for  this  process  is 
strongly  affected  by  the  cost  of  listing  and  travel  to  and  from  the  segment. 
Thus,  the  procedure  used  for  mail  and  phone  surveys  of  selecting  non-zero 
segments  on  the  basis  of  the  characteristics  of  a  single  unit  may  no 
longer  be  optimum.  Rather,  it  may  be  more  efficient  to  conduct  multiple 
screening  calls  before  deciding  whether  to  include  or  exclude  a  cluster. 

Consider the following design. An interviewer makes j screening
calls  at  points  within  a  cluster  that  are  relatively  close  geographically. 
(For  specificity,  we  can  assume  that  these  might  be  housing  units  on 
different  blocks  within  the  same  Census  Tract.)  If  the  screening  call 
yields  an  eligible  household,  then  additional  calls  are  made  sequentially 
until  k  additional  eligible  households  are  located.  If  the  screening 
call  does  not  yield  an  eligible  household,  no  additional  screening  calls 
are  made.  This  is  an  unbiased  sampling  procedure  which  is  a  direct 
extension  of  the  method  of  the  previous  section. 

Note  that  this  procedure,  unlike  the  one  discussed  for  phone  and  mail 
screening,  produces   variability  in  the  total  number  of  completed  cases  in 
a cluster. The number of completed cases will range from k+1 to j(k+1).
This variability increases the sampling variance as does the clustering.
Even so, it may be shown that in some cases, this procedure is optimum.
Using the same notation as in the previous section, let:

C_T = travel costs for one trip to an average segment. It is assumed
that the segments have been listed previously so that no additional
listing costs are required.

C_F = total costs of face-to-face screening and interviewing procedures.

Assume  that  j  screening  calls  can  be  made  on  a  single  trip  so  that 
no  additional  travel  costs  are  required.  These  j  calls  will  yield  0-j 
eligible  units.  For  each  eligible  unit  found,  continue  screening  until 
k  additional  eligible  units  are  found.  It  is  assumed  that  one  additional 
trip  is  required  for  interviewing  in  a  segment  once  an  eligible  unit  has 
been  found.  More  complex  cost  functions  describing  travel  may  be  used, 
but  the  general  result  would  still  follow. 

It can be demonstrated that for most applications this is more
efficient than either conducting only a single screening call per segment or
conducting a very large number of screening calls in every segment. We
use the same cost function as previously, adding, however, a term C_T for
travel to screen the cluster. Let C_F be the total costs for this
face-to-face procedure. Then

\[ C_F = m(k+1)\left[C_I + C_S\,\frac{1-t}{\pi}\right] + \frac{mt}{\pi}\,C_S + m\,C_T\,\frac{\pi+1}{\pi} \tag{3.1} \]

\[ (k+1)_{\text{opt}} = \left[\frac{t\,C_S + C_T(\pi+1)}{C_I\pi + (1-t)\,C_S}\cdot\frac{1-\rho}{\rho}\right]^{1/2} \tag{3.2} \]


The  value  of  multiple  starts  depends  on  the  fact  that  the  homogeneity 
between  elements  in  a  cluster  typically  declines  as  the  cluster  increases 
in  geographic  size.   If  this  is  not  the  case,  then  the  new  cost  function 
merely  means  an  increase  in  optimum  cluster  size.  This  is  immediately 
evident since the numerator in formula (3.2) has a C_T term, but is otherwise
identical to the phone optimum in formula (2.3).

Example 3.1

As  an  illustration,  suppose  the  sample  of  Black  households  in  Example 
2.1 required a face-to-face interview because of the topic. Let π = .12,
t = .7 (now working banks are irrelevant), C_I = $10, C_S = $4 and C_T = $16,
ρ = .05. Note that C_S is increased since face-to-face screening is more
costly than telephone screening. From formula (3.2),

\[ (k+1)_{\text{opt}} = \left[\frac{.7(4) + 16(1.12)}{10(.12) + .3(4)}\cdot\frac{.95}{.05}\right]^{1/2} = 12.8 \]


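Now suppose the homogeneity ρ' between blocks within a tract is low,
say .01, while ρ within the block is .05. Assuming no additional cost for
screening, it is possible to compute alternative values for j and compare
to a single screening call. For specificity assume that m = 100 tracts. Then

\[ C_F = 100(12.8)\left[10 + 4\left(\frac{.3}{.12}\right)\right] + 100\left[\frac{.7}{.12}(4) + 16\left(\frac{1.12}{.12}\right)\right] = \$40{,}767 \]

With j starting points per segment, the equivalent sample size is

\[ n_{\text{equiv}} = \frac{n}{[1 + \rho'(j-1)]\,[1 + \rho k]} \tag{3.3} \]

the cost function becomes

\[ C_F = mj(k+1)\left[C_I + C_S\,\frac{1-t}{\pi}\right] + m\left[\frac{jt}{\pi}\,C_S + C_T\,\frac{\pi+1}{\pi}\right] \tag{3.4} \]

and the optimum cluster size at each starting point is

\[ (k+1)_{\text{opt}} = \left[\frac{t\,C_S + \frac{C_T}{j}(1+\pi)}{C_I\pi + (1-t)\,C_S}\cdot\frac{1-\rho}{\rho}\right]^{1/2} \tag{3.5} \]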

Consider values of j = 2, 3, ... For alternate values of j and ρ', optimum
cluster sizes and costs may be determined and the equivalent sample sizes
compared to the equivalent sample size for j = 1. This is done in Table
3.1. As an illustration, consider j = 3, ρ = .05, ρ' = .1. Then

\[ (k+1)_{\text{opt}} = \left[\frac{.7(4) + \frac{16}{3}(1.12)}{10(.12) + .3(4)}\cdot\frac{.95}{.05}\right]^{1/2} = 8.3 \]

From (3.4),

\[ m = \frac{40{,}767}{3(8.3)\left[10 + 4\left(\frac{.3}{.12}\right)\right] + \frac{3(.7)}{.12}(4) + 16\left(\frac{1.12}{.12}\right)} = 56.7 \]

so n = mj(k+1) = 1411 and

\[ n_{\text{equiv}} = \frac{1411}{[1 + .1(2)]\,[1 + .05(7)]} = 871 \]


Table 3.1  Equivalent Sample Sizes for Values of j and ρ' in Example 3.1

                        Equivalent n for ρ' =
   j    Actual n     .01     .05     .10     .20     .30

   1      1280       805     805     805     805     805
   2      1350       990     952     909     833     769
   3      1411      1025     950     871     747      -
   4      1458      1049      -       -       -       -
   5      1483      1056      -       -       -       -
   6      1505      1062      -       -       -       -
   7      1515      1059      -       -       -       -
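
The entries of Table 3.1 can be approximated with a short script. This sketch assumes formulas (3.3) to (3.5) as reconstructed from the text and the fixed budget of $40,767 used in Example 3.1; the function name is hypothetical. It keeps the unrounded optimum cluster size, so its results differ slightly from the table, which appears to use rounded values of k.

```python
from math import sqrt


def multistart_design(budget, j, rho, rho_prime, t, pi, c_i, c_s, c_t):
    """Return (k+1, m, n, n_equiv) for j starting points per segment."""
    # Optimum cluster size per starting point, formula (3.5).
    cluster = sqrt(
        (t * c_s + (c_t / j) * (1 + pi)) / (pi * c_i + (1 - t) * c_s)
        * (1 - rho) / rho
    )
    # Cost of one segment, formula (3.4) with m = 1.
    per_case = c_i + c_s * (1 - t) / pi
    cost_per_segment = j * cluster * per_case + j * t * c_s / pi + c_t * (1 + pi) / pi
    m = budget / cost_per_segment
    n = m * j * cluster
    # Equivalent unclustered sample size, formula (3.3), with k = cluster - 1.
    n_equiv = n / ((1 + rho_prime * (j - 1)) * (1 + rho * (cluster - 1)))
    return cluster, m, n, n_equiv


if __name__ == "__main__":
    for starts in (2, 3, 4):
        size, m, n, n_eq = multistart_design(
            40767, starts, rho=0.05, rho_prime=0.10,
            t=0.7, pi=0.12, c_i=10, c_s=4, c_t=16,
        )
        print(starts, round(size, 1), round(m, 1), round(n), round(n_eq))
```

Varying rho_prime in the call shows how quickly the gain from extra starting points erodes as homogeneity between blocks rises.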

For the general population, ρ' is usually in the range of .05 - .2.
For rare populations, ρ' could be smaller. Unfortunately, usually one
would not know this until after the study. A conservative approach would
be to use general population values. For this example, j = 2 would be
optimum over a broad range of ρ'. While this result depends on the
specific cost function and value of ρ, additional calculations suggest
that two or three starting points per segment would be better than one
for most face-to-face screenings, unless ρ' is large. As ρ increases
relative to ρ', or for ρ' very small, even more starting points are
optimum. It is not clear that many such situations exist.

4. The Use of Incomplete Lists

Even  incomplete  lists  may  be  very  useful  in  identifying  areas  where 
the  special  population  is  located.  In  the  simplest  case,  assume  that  a 
random  (or  systematic)  sample  of  starting  points  is  chosen  from  the  list 
and  screening  continues  at  each  starting  point  until  k  additional  eligible 
respondents  are  located.  It  is  evident  that  this  procedure  is  almost 
identical  to  those  just  discussed. 

A  cluster  will  have  an  initial  probability  of  selection  proportional 
to  the  number  of  members  of  the  special  population  in  it.  Then,  sampling 
within  the  cluster  is  inversely  proportional  to  this  probability  so  that 
the  ultimate  sample  is  self-weighting. 

The sample is not unbiased, however, if there exist non-zero clusters
that have no eligible respondents on the incomplete list. These clusters
have no chance of selection. It is possible to measure the sample bias
from such a procedure if one has an estimate of the total size of the
special population. One would also estimate from the list the number of
nonzero clusters and from the screening the average number of eligible
persons per cluster. The product of these last two is an estimate of the number
of  persons  in  the  special  population  who  have  a  nonzero  probability  of 
selection.  The  difference  between  the  total  estimate  and  the  estimate 
of  those  with  nonzero  probabilities  would  indicate  potential  bias. 

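The cost function for using lists is very similar to those already
discussed.

Let C_L be the cost for a unit on the list. Then the total cost is

\[ C_K = mk\left[C_I + \frac{1-t}{\pi}\,C_S\right] + m\,[C_I + C_L] \tag{4.1} \]

\[ k_{\text{opt}} = \left[\frac{C_I + C_L}{C_I + \frac{1-t}{\pi}\,C_S}\cdot\frac{1-\rho}{\rho}\right]^{1/2} \tag{4.2} \]

The cost per case using optimum clustering is:

\[ C_I + \frac{1-t}{\pi}\,C_S + \frac{C_L}{k_{\text{opt}}+1} \tag{4.3} \]

and the cost per an equivalent unclustered case is:

\[ C_{KE} = [1 + \rho k_{\text{opt}}]\left[C_I + \frac{1-t}{\pi}\,C_S + \frac{C_L}{k_{\text{opt}}+1}\right] \tag{4.4} \]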
Example  4.1 

Suppose,  following  Example  2.2  that  a  list  were  available  of  Hadassah 
members  and  one  wished  to  sample  American  Jews  in  a  telephone  survey.  Assume 
that the list was estimated to include clusters in which 95 percent of American
Jews live. The cost of the list C_L is 10¢ for each member.

As before, let C_I = 10, C_S = 2, ρ = .02, π = .01, t = .86. Then

\[ k_{\text{opt}} = \left[\frac{10 + .1}{10 + \frac{.14}{.01}(2)}\cdot\frac{.98}{.02}\right]^{1/2} = 3.6 \]

Note that since screening costs are small, much smaller clusters are optimum.
The cost per case is 10 + (.14/.01)(2) + .1/4.6 = $38.02 and per equivalent case is
[1 + .02(3.6)][38.02] = $40.76, about 2/3 of the costs of phone screening.

5.  Use  of  Combined  Methods 

The  use  of  combined  procedures  is  common  in  survey  sampling.  The 
standard  optimization  methods  allocate  sampling  rates  to  the  procedures 
inversely  to  the  square  root  of  the  ratios  of  costs.  Bayesian  allocation 
procedures result in the elimination of very costly procedures when the
reduction in variance would be small and resources are limited.

In  the  screening  of  special  populations,  one  would  be  very  unlikely 
to  use  combined  methods  if,  as  in  Example  4.1,  the  list  is  95  percent 
complete,  since  the  marginal  reduction  in  total  survey  error  would  be 
negligible. Suppose, however, a list includes clusters where 50-90 percent
of  a  special  population  live.  Then,  combined  methods  become  optimum, 
especially  if  the  study  uses  lists  and  face-to-face  screening.  Note  that 
if  one  uses  lists,  the  costs  of  the  other  screening  methods  increase 
since  one  is  now  attempting  to  locate  only  those  clusters  with  a  zero 
probability of selection from the list.

Example 5.1

Suppose, following Example 3.1, we want to use a list of subscribers
to a Black magazine to reduce screening costs. As in Example 3.1, let
π = .12, t = .7, C_I = 10, C_S = 4, C_T = 16, C_L = .1, ρ = .05. (We assume
ρ' > .2 so only one start is optimum.) The cost function for this method is:


\[ C_K = mk\left[C_I + \frac{1-t}{\pi}\,C_S\right] + m\,[C_I + C_T + C_L] \tag{5.1} \]

\[ k_{\text{opt}} = \left[\frac{C_I + C_T + C_L}{C_I + \frac{1-t}{\pi}\,C_S}\cdot\frac{1-\rho}{\rho}\right]^{1/2} \tag{5.2} \]

and

\[ C_{KE} = (1 + \rho k_{\text{opt}})\left[C_I + \frac{1-t}{\pi}\,C_S + \frac{C_T + C_L}{k_{\text{opt}}+1}\right] \tag{5.3} \]

So

\[ k_{\text{opt}} = \left[\frac{26.10}{10 + \frac{.3}{.12}(4)}\cdot\frac{.95}{.05}\right]^{1/2} \approx 5 \]

and

\[ C_{KE} = 1.25\left[10 + \frac{.3}{.12}(4) + \frac{16.1}{6}\right] = \$28.35 \]

Assume  that  the  list  is  estimated  to  cover  two-thirds  of  the  clusters 
in  which  the  population  lives  so  that  a  combined  procedure  using  initial 
face-to-face screening is required. Then the new values for π and t for
this screening are π = .04, t = .9. The optimum k+1 is 21.9 and the cost
per equivalent case is $88.15. Therefore, the sampling rate for initial
face-to-face screening should be .57 that for list screening. A total
sample of 1,000 would contain 776 cases from 155 clusters selected using
the list and 224 cases from 10 clusters using face-to-face initial screenings.
As with all disproportionate samples, weighting would be required for
unbiased estimates.
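
The allocation in Example 5.1 follows the square-root rule stated at the start of this section. The sketch below assumes the two costs per equivalent case from the example and the two-thirds list coverage; the function names are illustrative. It does not round to whole clusters, which is presumably why its split differs by a few cases from the 776 and 224 given in the text.

```python
from math import sqrt


def relative_rate(cost_a, cost_b):
    """Sampling rate in stratum B relative to stratum A (inverse square root of costs)."""
    return sqrt(cost_a / cost_b)


def allocate(total_n, population_shares, costs):
    """Split total_n across strata with sampling rates proportional to 1/sqrt(cost)."""
    raw = [share / sqrt(cost) for share, cost in zip(population_shares, costs)]
    scale = total_n / sum(raw)
    return [value * scale for value in raw]


if __name__ == "__main__":
    # List stratum: 2/3 of the population at $28.35 per equivalent case;
    # face-to-face stratum: 1/3 of the population at $88.15.
    print(round(relative_rate(28.35, 88.15), 2))
    print([round(x) for x in allocate(1000, [2 / 3, 1 / 3], [28.35, 88.15])])
```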

6.  Phone  Screening  and  Face-to-Face  Interviewing 

In some situations, it is possible to screen by telephone, but interviewing
must be face-to-face. If this procedure is followed, screening
costs  are  obviously  reduced,  but  interviewer  travel  costs  to  locate  the 
respondent  are  added.  In  an  earlier  paper,  Sudman  showed  that  for  most 
situations,  the  joint  use  of  phone  screening  and  face-to-face  interviewing 
was  more  efficient  than  face-to-face  screening  and  interviewing.   (Sudman, 
1978). 

In  that  paper,  however,  the  procedures  for  screening  zero  segments 
that  are  discussed  above  in  Sections  2  and  3  were  not  used.  When  these  are 
considered, both face-to-face and phone screening are more efficient. It
remains true, however, that for relatively rare populations, the joint
procedure is still much more efficient than face-to-face screening.

To see this, consider the following cost function that describes the
joint  procedure: 

\[ C_{PF} = m(k+1)\left[C_I + C_L + C_{SP}\,\frac{1-t}{\pi}\right] + m\left[\frac{t}{\pi}\,C_{SP} + 2C_T\right] \tag{6.1} \]

\[ (k+1)_{\text{opt}} = \left[\frac{t\,C_{SP} + 2\pi C_T}{(C_I + C_L)\pi + C_{SP}(1-t)}\cdot\frac{1-\rho}{\rho}\right]^{1/2} \tag{6.2} \]

The only new term is C_L, which is the cost of locating an eligible
respondent after the phone screening; C_SP is the cost of screening a case
on the telephone. The subscript is added as a reminder that this is not
identical  to  the  cost  of  face-to-face  screening.  It  is  assumed  that 
two  trips  to  the  segment  are  necessary  to  locate  and  interview  all  eligible 
respondents.  This  is  comparable  to  the  assumptions  made  in  section  3. 

Comparing  formulas  (3.1)  and  (6.1),  the  major  tradeoffs  are  between 
travel  costs  to  zero  sites  in  face-to-face  screening  and  location  costs 
in telephone screening.

Example 6.1

Assume that the face-to-face screening procedure in Example 3.1 is
to be compared to a joint procedure. The cost per equivalent case in that
example is C_FE = $50.64.

For the joint procedure, use the same estimates as in earlier examples:
t = .9, π = .12, ρ = .05, C_I = $10, C_SP = $2, C_T = $16, and let C_L = $2. Then

\[ (k+1)_{\text{opt}} = \left[\frac{(.9)(2) + .24(16)}{12(.12) + 2(.1)}\cdot\frac{.95}{.05}\right]^{1/2} = 8.1 \]

\[ C_{PFE} = 1.355\left[10 + 2 + 2\left(\frac{.1}{.12}\right) + \frac{\frac{.9}{.12}(2) + 32}{8.1}\right] = \$26.38 \]

which is roughly half of the face-to-face screening cost.

This advantage for joint procedures disappears for values of π equal
to or greater than about .3. Thus, the results are similar to those
of Section 2. Joint procedures are optimum for the same populations for
which phone screening is optimum.

7. Variations in Density of Special Populations in Non-Zero Clusters

We now consider the situation where the special population is unevenly
distributed among the non-zero clusters. This would be likely to occur
with ethnic groups where most members would live in a few clusters with
high proportions of the population, but others would be thinly spread
among the general population. While it is possible to have identified these
clusters from earlier screening or Census data, it is also possible to
estimate the proportion of the special population by asking the first
contacted household(s) to estimate π_j.

Phone Screening - Assume first that the non-zero clusters have been
identified and categorized into strata where π_j is the proportion of the
special population to the total population in that stratum. In this
case, no clustering is required and the cost per case in the jth stratum
is C_I + C_S/π_j.
An optimum allocation procedure would be to sample from the strata with
rates inversely proportional to the square roots of costs. Thus, the
relative rates in strata A and B would be:


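\[ \frac{r_A}{r_B} = \left[\frac{C_I + C_S/\pi_B}{C_I + C_S/\pi_A}\right]^{1/2} \tag{7.1} \]

If there is a screening or list cost, the procedures of Section 2
apply. Then, from formula (2.5),

\[ \frac{r_A}{r_B} = \left\{\frac{[1 + \rho(k_B - 1)]\left[C_I + (1-t)\,\frac{C_S}{\pi_B} + \frac{tC_S}{\pi_B k_B}\right]}{[1 + \rho(k_A - 1)]\left[C_I + (1-t)\,\frac{C_S}{\pi_A} + \frac{tC_S}{\pi_A k_A}\right]}\right\}^{1/2} \tag{7.2} \]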


Example  7.1 

Suppose one wishes to select a sample of Hispanics in a community using
phone screening. Ignoring zero clusters, assume most members live in
clusters where π_1 = .5, but a few live in areas where π_2 = .01. As before,
let C_I = 10, C_S = 2. Then

\[ \frac{r_1}{r_2} = \left[\frac{10 + 2/.01}{10 + 2/.5}\right]^{1/2} = 3.9 \]

Face-to-Face  Screening 

Face-to-face  screening  would  require  clustering.  Again  assuming 
the  non-zero  clusters  have  been  identified,  the  total  cost  in  the  jth 
stratum  would  be: 


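\[ C_{Fj} = mk\left[C_I + \frac{C_S}{\pi_j}\right] + m\,C_T \tag{7.3} \]

and

\[ k_{\text{opt}} = \left[\frac{C_T}{C_I + C_S/\pi_j}\cdot\frac{1-\rho}{\rho}\right]^{1/2} \tag{7.4} \]

The cost per equivalent case in the jth stratum is:

\[ C_{Ej} = [1 + \rho(k_{\text{opt}} - 1)]\left[C_I + \frac{C_S}{\pi_j} + \frac{C_T}{k_{\text{opt}}}\right] \tag{7.5} \]

and r_A/r_B = (C_EB/C_EA)^(1/2).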

If  the  non-zero  clusters  had  not  been  previously  identified,  ratios  of  the 
cost functions in Section 3 could be computed.

Example 7.2

Assume one wishes a sample of Hispanics, but phone ownership in the
community is too low so that face-to-face screening is required. As in
Example 7.1, assume zero clusters have already been located, let π_1 = .5,
π_2 = .01, C_I = 10, C_S = 2, ρ = .05 and let C_T = 32 to cover the costs of
two trips to the cluster. Then:

k_1opt = 6.6          k_2opt = 1.7

C_E1 = $24.13         C_E2 = $236.83

and r_1/r_2 = (9.8)^(1/2) = 3.1

Sudman  (1972)  discussed  the  problem  earlier  for  very  rare  populations 
where  zero  clusters  had  not  been  identified.  In  that  situation,  it  was 
sometimes optimum either from a Bayesian or a total survey error perspective
to omit strata with very few members of the special population.
The same would be the case here if π_j is much less than .01 and the jth
stratum contains a small fraction of the total special population.

Procedures for variable sampling rates require weighting and introduce
administrative complexities. They are worthwhile, however, to reduce the
very high screening costs associated with locating isolated members of
special populations.

8. Non-Clustered Populations

For special populations that are not geographically clustered the
network procedures that have been developed by Sirken and Nathan can be
generalized (Nathan, 1976; Sirken, 1970, 81). As initially used, these
procedures improved estimates of births and deaths by obtaining information
from  a  respondent,  not  only  about  members  in  the  household,  but  also  about 
close  relatives  (sons  and  daughters,  brothers  and  sisters)  who  lived 
outside  the  house.  The  procedure  permitted  the  computation  of  multiplicity 
weights  to  account  for  the  fact  that  there  were  known  probabilities  that 
the  same  birth  or  death  could  be  reported  in  several  households. 

The  direct  extension  is  simply  to  ask  for  the  name  and  address  of 
members  of  a  special  population  using  a  fixed  inclusion  rule.  Thus,  in 
two  separate  recent  examples,  relatives  have  been  asked  to  identify  Viet 
Nam  War  veterans  and  cancer  patients.  The  probability  of  a  person  being 
identified  is  proportional  to  the  number  of  households  which  contain  a 
person  who  can  identify  the  member  of  the  special  population. 
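
The weighting implied by this inclusion rule can be sketched in Python. This is an illustration under stated assumptions, not the estimator as published by Sirken or Nathan: households are assumed to be drawn with one equal sampling fraction, and each report is assumed to carry the number of households eligible to report that person. The function name and the sample data are hypothetical.

```python
def multiplicity_estimate(reports, sampling_fraction):
    """Estimate the size of a special population from network reports.

    reports: list of (person_id, n_eligible_households) pairs, one pair
        per report obtained from a sampled household. A person linked to
        m households can be reported up to m times, so each report is
        weighted by 1 / m.
    sampling_fraction: share of all households that were screened.
    """
    weighted_reports = sum(1.0 / m for _, m in reports)
    return weighted_reports / sampling_fraction


# Hypothetical data: person B is reported by two of four linked households.
reports = [("A", 1), ("B", 4), ("B", 4), ("C", 2)]
estimate = multiplicity_estimate(reports, sampling_fraction=0.01)
print(estimate)
```

Dividing each report by the number of linked households offsets the higher chance of selection that well-connected persons have.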

While  the  theory  of  this  procedure  is  well  developed,  it  is  limited 
by  possible  response  errors.  The  empirical  question  is  whether  respondents 
can  report  with  reasonable  accuracy  about  the  characteristics  or  behavior 
of nominated persons. The empirical data reported by Nathan (1976) and
by Rothbart, Fine and Sudman (1981) indicate that individuals can
report well about close relatives such as children and siblings, but with
lower levels of accuracy about nephews.

In  theory,  there  is  no  reason  this  procedure  could  not  be  expanded  to 
larger  networks  such  as  neighbors,  friends,  co-workers  or  members  of  an 
organization. It would be necessary, however, to develop and test procedures
for  specifying  who  is  to  be  included  or  excluded,  and  to  be  able  to 
estimate  response  errors. 

Example 8.1

Suppose  one  wished  to  identify  a  sample  of  persons  who  are  missing 
one  or  more  limbs  so  that  interviews  could  be  conducted  with  them.  The 
following series of location questions might be considered:

Conventional: Is there anyone in this household who is missing one
or  more  limbs? 

Close  relatives:  How  many  sons  or  daughters  do  you  and  your  spouse 
have  living  away  from  home?  How  many  brothers  or  sisters  do  you  and  your 
spouse  have  living  away  from  home? 

Do  any  of  your  children  have  any  missing  limbs? 

Do any of your or your spouse's brothers or sisters have any missing
limbs? 

Distant  relatives:  How  many  nieces  and  nephews  do  you  have,  whom  you 
keep  in  touch  with,  at  least  once  in  a  while? 

How  many  aunts  or  uncles  do  you  have,  whom  you  keep  in  touch  with, 
at  least  once  in  a  while? 

How  many  cousins  do  you  have,  whom  you  keep  in  touch  with,  at  least 
once  in  a  while? 

Do any of your nieces or nephews have a missing limb?

Do  any  of  your  aunts  or  uncles  have  a  missing  limb? 

Do  any  of  your  cousins  have  a  missing  limb? 

Neighbors: About how many neighbors living * would you recognize if
you met them?

* Multi-listing buildings: in the building
  Single family - urban: on this block
                - rural: around here

Do any of your neighbors have any missing limbs?

Co-workers: About how many other people work with you in your
department  (group,  unit,  etc.)?  Do  any  of  your  co-workers  have  a  missing 
limb? 

It  is  evident  from  the  example  that  different  response  errors,  of 
greater or lesser seriousness, are possible. The error of misclassifying
a person as falling into the special population when he or she does not
belong there would be corrected when that person is contacted directly. The reverse
error  of  not  including  a  person  in  the  special  population  when  the  person 
belongs  there  cannot  be  corrected  and  leads  to  an  overstatement  of  the 
probability  of  selection,  and  thus  to  a  sample  bias.  This  undercoverage 
bias  depends  both  on  the  topic  and  the  closeness  of  the  acquaintance  between 
the  nominator  and  the  nominees. 
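
The size of this undercoverage bias can be illustrated with a small Python sketch. It assumes households are sampled with equal probability and that the multiplicity weight is one over the full number of linked households. The reporting probabilities are hypothetical values chosen for illustration, not results from any study cited here.

```python
def expected_weighted_count(link_groups):
    """Expected multiplicity-weighted count for one true member.

    link_groups: list of (n_households, report_prob) pairs. Each pair is
        a group of linked households (own household, siblings, and so on)
        together with the assumed probability that such a household
        reports the person when it is screened.
    A value of 1.0 means no bias; a value below 1.0 means the person is
    under-represented because the weight assumes every link reports.
    """
    total_households = sum(n for n, _ in link_groups)
    expected_reporting = sum(n * p for n, p in link_groups)
    return expected_reporting / total_households


# Hypothetical: own household, three sibling households, and four
# households of more distant relatives.
groups = [(1, 1.0), (3, 0.9), (4, 0.6)]
share = expected_weighted_count(groups)
print(f"expected weighted count: {share:.4f}")
```

With these illustrative inputs about a quarter of the person's weight is lost, and most of the loss comes from the more distant relatives.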

One potential problem, locating a nominated respondent, appears to
require only modest effort. Even if the nominator does not have an
exact address (or even name) it is usually possible to enlist intermediaries
who are also in the network. Thus, when an aunt did not know the address
of a nephew who was a Viet Nam War veteran, she did know the address or
phone of her sister, who did know her son's address (Rothbart, Fine and
Sudman, 1981).

Potentially  the  most  serious  and  least  understood  response  error, 
except  for  close  relatives,  is  in  the  estimate  of  the  size  of  the  network. 
This  estimate  can  be  obtained  from  either  the  nominator  or  the  nominee. 
Almost  certainly  the  absolute  error  is  a  function  of  network  size,  which 
would  limit  the  use  of  very  large  networks.  Further  research  is  needed  to 
determine  the  size  and  directions  of  response  errors  in  estimating  network 
size, as well as the possible demographic and social psychological correlates
of  these  response  errors. 

Nevertheless, as the Viet Nam veteran study demonstrated, it is
possible to cut the number of screenings in half by using relatives, and
even larger reductions in cost are possible for larger networks. For rare
special populations, total survey error may be minimized by using fairly
large networks, even in the presence of response errors.
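
The arithmetic behind such a reduction can be sketched in Python. The sketch assumes a rare population, error-free reporting, and a hit rate per screening equal to the prevalence times the average number of households able to identify each member. The prevalence and network sizes are hypothetical and are not taken from the veteran study.

```python
def screenings_needed(target_cases, prevalence, avg_reporting_households):
    """Approximate number of screening interviews needed.

    prevalence: share of households that contain a member of the special
        population (assumed small).
    avg_reporting_households: average number of households that can
        identify a given member, counting the member's own household.
    """
    hit_rate = prevalence * avg_reporting_households
    return target_cases / hit_rate


# Hypothetical inputs: 100 cases wanted, one household in 100 contains a member.
household_only = screenings_needed(100, 0.01, 1)
with_relatives = screenings_needed(100, 0.01, 2)
print(household_only, with_relatives)
```

Doubling the average number of reporting households halves the screenings in this sketch. Response errors would offset some of that gain.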